perm filename REWRIT.PUB[L70,TES] blob sn#025977 filedate 1973-02-20 generic text, type T, neo UTF8
COMMENT ⊗   VALID 00013 PAGES 
RECORD PAGE   DESCRIPTION
 00001 00001
 00005 00002	.PUBLISH ← FALSE 
 00008 00003	.SEC BACKGROUND
 00013 00004	.SEC PATTERN-DIRECTED COMPUTATION
 00017 00005	.SEC LIST STRUCTURE TRANSFORMATIONS
 00020 00006	.SEC REPLACEMENT
 00022 00007	.SEC EXTENSIBLE FUNCTIONS
 00024 00008	.SEC THE EXTENSIBLE COMPILER
 00028 00009	.SEC AUTOMATIC ORDERING OF REWRITE RULES
 00032 00010	.SEC AN ORDERING FUNCTION
 00035 00011	.SEC OTHER FACILITIES AND APPLICATIONS
 00037 00012	.SEC CONCLUSIONS
 00040 00013	.<< REFERENCES >>
 00044 ENDMK
⊗;
.PUBLISH ← FALSE ;
.BIGPAGE ← (SPREAD=2)
.SPREAD ← 1
.IF BIGPAGE THEN START
.DEVICE TTY
.PAGE FRAME 66 HIGH 80 WIDE
.AREA HEADING LINES 1 TO 3  CHARS 12 TO 80
.AREA TEXT    LINES 6 TO 66 CHARS 12 TO 80
.COUNT PAGE FROM 0
.NEXT PAGE
.END
.GROUP SKIP 10
.BEGIN CENTER
THE LISP70 PATTERN MATCHING SYSTEM

Lawrence G. Tesler
Horace J. Enea
David C. Smith

Stanford University

February, 1973
.END
.MACRO SEC(NAME) ⊂NEXT PAGE; ONCE NOFILL
NAME
.SKIP 2⊃
.MACRO APP(TAG,NAME) ⊂NEXT PAGE; TAG: NEXT APPENDIX!; ONCE NOFILL
APPENDIX {APPENDIX!}: NAME
.SKIP 2⊃
.MACRO B ⊂SKIP 1; BEGIN INDENT 10; GROUP ; NOFILL ; SINGLE SPACE ⊃
.MACRO LONG ⊂SKIP 1; BEGIN INDENT 10; NOFILL ; SINGLE SPACE ⊃
.MACRO E ⊂END⊃
.MACRO EC ⊂ END CONTINUE ⊃
.MACRO EB ⊂ E ; B ⊃
.REQUIRE "REF1.PUB" SOURCE_FILE
.COUNT APPENDIX PRINTING "I"
.TURN ON "↓_α#∪{" ;
.NOJUST
.INDENT 5
.EVERY HEADING({PAGE!},,|Tesler, Enea, and Smith|)
.SKIP 6
ABSTRACT

LISP70 is a descendant of LISP which
emphasizes pattern-directed computation and extensibility.
A function can be defined
by a set of pattern rewrite rules as well as by the normal LAMBDA method.
New rewrite rules can be added to a previously defined function; thus
LISP70 functions are said to be "extensible".  It is possible to have the
new rules merged into the function automatically so that special cases are
checked before general cases.  Some of the facilities of the rewrite system are
described and a variety of applications are demonstrated.

.IF NOT PUBLISH THEN START

NOTICE

This is a limited circulation draft of a paper to be submitted to a
conference or journal.  No right to publicize its contents is granted.
.END
.BREAK
.DOUBLE SPACE
.PREFACE 2
.SEC BACKGROUND

During the past decade,
LISP#[{REF MCCARTHY_LISP}] has been a principal programming language for artificial
intelligence and other frontier applications of computers.
Like other widely used languages, it has spawned many variants,
each attempting to make one or more improvements.
Among the aspects that have received particular attention are
notation#[{ref ABRAHAMS_LISP2},{ref ENEA_MLISP},{ref LANDIN_ISWIM},{ref SMITH_MLISP}],
control structure#[{REF BURSTALL_POP2},{ref HEWITT_THESIS},{REF RULIFSON_QA4},{REF SMITH_BACK}],
data base management#[{REF HEWITT_PLANNER},{REF RULIFSON_QA4},{REF SUSSMAN_CONNIVER}],
interactive editing and debugging#[{ref TEITELMAN_DWIM}],
and execution efficiency.

A need for a successor to LISP has been recognized#[{ref BOBROW_SUCCESSOR}], and
several efforts in this direction are under way.  The approach being taken with
TENEX-LISP is to begin with an excellent debugging system#[{REF TEITELMAN_BBNLISP}]
and to add on flexible control structure#[{ref BOBROW_CONTROL}] and Algol-like
notation.
The approach taken by LISP70 and by the LISP-related
ECL#[{REF WEGBREIT_ECL}] is to begin with an extensible kernel language which
users can tailor and tune to their own needs.

"Tailoring" a language means defining facilities which assist in the solution
of particular kinds of problems which may have been unanticipated by the
designers of the kernel.  "Tuning" a language means specifying more efficient
implementations for statements which
are executed frequently in particular programs.

A language that can be used on only one computer is not of universal utility;
the ability to transfer programs between computers increases its value.
However, a
language that is extensible both upward and downward
is difficult to transport if downward extensions mention
machine-dependent features#[{REF DICKMAN_ETC},{REF DUBY_EXT}].
LISP70 generates code for an
"ideal LISP machine" called "ML" and only the translation from ML to object
machine language is machine-dependent.  Thus, downward extensions can be
factored into a machine-independent and a machine-dependent part, and during
program transfer, the machine-dependent recoding (if any) is clearly isolated.

Among the improvements LISP70 makes to LISP are
backtrack control structure#[{REF SMITH_BACK}],
streaming#[{REF LEAVENWORTH_STREAMING}],
pattern-directed computation, and
extensible functions.

The subjects to be covered in the present paper are
pattern-directed computation and extensible functions.
.SEC PATTERN-DIRECTED COMPUTATION

Many of the data tranformations performed in LISP applications are more
easily described by pattern matching rules than by
algorithms#[{REF HEWITT_PLANNER},{REF RULIFSON_QA4},{REF SUSSMAN_CONNIVER},{REF TEITELMAN_FLIP}].
In addition,
pattern matching rules are appropriate for the description of
input-output conversion, parsing, and compiling#[{REF SMITH_EXT}].
LISP70 places great emphasis on "pattern rewrite
rules"#[{REF COLBY_DOC},{REF COLBY_WATT},{REF KAY_FLEX},{REF WEIZENBAUM_ELIZA}]
as an alternative and adjunct to algorithms as a means of defining functions.

A brief explanation of rewrite rule syntax and semantics will be
presented with some examples to demonstrate the clarity of the notation.

Each rule is of the form DEC→REC.  The DEC (decomposer) is matched against
the "input stream".  If it matches, then the REC (recomposer) generates
the "output stream".

A literal in a pattern is represented by itself
if it is an identifier or number, or preceded by a quote (') if it is
a special character.
.B
RULES OF SQUARE =
	2 → 4,
	5 → 25,
	12 → 144 ;
.E
A private variable
of the rule is represented by an identifier prefixed by a colon (:);
it may be bound to only one value during operation of the rule.
.B
RULES OF EQUAL =
	:X :X → T,
	:X :Y → NIL ;
.E
A list is represented by a pair of parentheses
surrounding the representations of its elements.
A segment of zero or more elements is represented by an ellipsis symbol (α.α.α.).
.B
RULES OF CAR =
	(:X ...) → :X ;
.EB
RULES OF CDR =
	(:X ...) → (...) ;
.EB
RULES OF CONS =
	:X (...) → (:X ...) ;
.EB
RULES OF ATOM =
	(:X ...) → NIL,
	:X → T ;
.EB
RULES OF APPEND =
	(...) (...) → (... ...) ;
.E

If a segment needs a name, it is represented by an identifier prefixed by
a double-colon (::).
.B
RULES OF ASSOC =
	:X (... (:X ::Y) ...) → (:X ::Y),
	:X (...) → NIL ;
.E
A function F can be called in a pattern, passing it a single
argument ARG, by the construct: ARG@F
(there are also ways of passing several arguments to a function).
.B
RULES OF LENGTH =
	( )	  →  0,
	(:X ...)  →  (...) @LENGTH @ADD1 ;
.E
.SEC LIST STRUCTURE TRANSFORMATIONS

The following set of rules defines a function
MOVE_BLOCK of three arguments: a block to be moved,
a location to which it should be moved, and a representation of the current
world.  The function moves block :B from its current location in the world to
location :TO, and the transformed representation of the world is returned.
.B
RULES OF MOVE_BLOCK =

	:B :TO (... (:TO ... :B ...) ...)
	     → (... (:TO ... :B ...) ...),

	:B :TO (... (... :B ...) ... (:TO ...   ) ...)
	     → (... (...    ...) ... (:TO ... :B) ...),

	:B :TO (... (:TO ...   ) ... (... :B ...) ...)
	     → (... (:TO ... :B) ... (...    ...) ...),

	:B :TO (... (... :B ...) ...)
	     → (... (...    ...) ... (:TO :B)),

	:B :TO (...)
	     → (BLOCK :B NOT IN (...)) @ERROR ;
.EC
In the first case, the block is already where it belongs, so the world does
not change;
in the second, the block is moved to the right; in the third, to the left;
in the fourth, the location :TO does not exist yet and is created; in
the last case, :B is not in the world and the ERROR routine is called.

Functions such as MOVE_BLOCK have been used in a simple planning program written by
one of the authors.
Imagine writing MOVE_BLOCK as an algorithm; it would require the use of
auxiliary functions or of a PROG with state variables and loops.  Bugs would
be more likely in the algorithm because its operation would not be so transparent.
.SEC REPLACEMENT

If F is any function, then the
construct <F> occurring in a DEC pattern signifies "replacement".
This means that
F is invoked to translate a substream of the input stream, and
that substream is replaced by its translation.  The altered input stream can
then continue to be matched by the pattern to the right of <F>.

The following example is from the MLISP compiler, which calls itself
recursively to translate the condition and arms of an IF-statement to LISP:
.B
RULES OF MLISP =

	IF <MLISP>:X THEN <MLISP>:Y ELSE <MLISP>:Z
		→ (COND (:X :Y) (T :Z)),

	IF <MLISP>:X THEN <MLISP>:Y
		→ (COND (:X :Y) (T NIL)),

	IF <MLISP>:X
		→ (MISSING THEN) @ERROR,

	IF	→ (ILLEGAL EXPRESSION AFTER IF) @ERROR ;
.E
Here is another example.  The predicate PALINDROME is true iff the input
stream is a mirror image of itself, i.e., if the left and right ends are equal
and the middle is itself a palindrome.
.B
RULES OF PALINDROME =	:X	→	T,

		      :X :X	→	T,

	:X <PALINDROME>T :X	→	T,

			...	→	NIL ;
.E
The replacement facility provides both the non-terminal symbols
of syntactic parsing and the "actors" of PLANNER#[{REF HEWITT_PLANNER}].
.SEC EXTENSIBLE FUNCTIONS

New rules may be added to an existing set of rewrite rules under program control;
thus, any compiler table or any other system of rewrite rules can be extended
by the user.  For this reason, a set of rewrite rules is said to be an
"extensible function".  The "ALSO" clause is used to add cases to an extensible
function:
.B
RULES OF MLISP ALSO =

	IF <MLISP>:X THEN <MLISP>:Y ELSE
		→ (MISSING EXPRESSION AFTER ELSE) @ERROR,

	IF <MLISP>:X THEN
		→ (MISSING EXPRESSION AFTER THEN) @ERROR ;
.E
Extensions can be made effective throughout the program or only in the current
block, as the user wishes.

A regular LAMBDA function can also be extended.
Its bound variables are considered
analogous to a DEC and its body analogous to a REC.  Accordingly, the compiler
converts it to an equivalent rewrite function of one rule before extending it.
.SEC THE EXTENSIBLE COMPILER

To make an extensible compiler practical, the casual user must be able to
understand how it works in order to change it.  To demonstrate that it is not
inordinately difficult to understand the LISP70 compiler, those rules which
get involved in translating a particular statement from MLISP to LAP/PDP-10 are
shown below.  A simplified LISP70 (typeless and unhierarchical)
is used in the examples, but the real thing is not much more complicated.
The statement to be translated is:
.B
	IF A < B THEN C ELSE D
.EC
The rules invoked in the MLISP-to-LISP translator are:
.B
RULES OF MLISP =

	IF <MLISP>:X THEN <MLISP>:Y ELSE <MLISP>:Z
			→ (COND (:X :Y) (T :Z)),

	:X  '<  :Y	→	(LESSP :X :Y),

	:VAR		→	:VAR ;
.E
The LISP-to-ML compiler below utilizes the following feature: if a colon variable
occurs in the REC but it did not occur in the DEC, an "existential value"
(which is something like a generated symbol) is bound to it.
Here, the existential value is used as a compiler-generated label.
.B
RULES OF COMPILER =

	(COND (T :E))
	      →	:E @COMPILER,

	(COND (:B :E) ...)
	      →	:B @COMPILER
		(BRANCH_FALSE :ELSE)
		:E @COMPILER
		(BRANCH :OUT)
		(LABEL :ELSE)
		(COND ...) @COMPILER
		(LABEL :OUT),

	(LESSP :A :B)
	      →	:A @COMPILER
		(PUSH_DOWN)
		:B @COMPILER
		(COMPARE LESS),

	:V    →	(LOAD :V) ;
.E
The ML-to-LAP translator below assumes that the value register
of the ideal machine is represented on the PDP-10 by a
register named "VAL", that there is
a single stack based on register "P", and that variables can be
accessed from fixed locations in memory.
.B
RULES OF ML =

	(BRANCH_FALSE :LBL)
	      →	(JUMPE VAL :LBL),

	(BRANCH :LBL)
	      →	(JRST :LBL),

	(LABEL :LBL)
	      →	:LBL,

	(PUSH_DOWN)
	      → (PUSH P VAL),
#
	(COMPARE LESS)
	      →	(CAMGE VAL 0 P)
		(TDZA VAL VAL)
		(MOVEI VAL 1)
		(POP P),
#		
	(LOAD :V)
	      → (MOVE VAL :V) ;
.E
The code generated is thus:
.B
	(MOVE VAL A)
	(PUSH P VAL)
	(MOVE VAL B)
	(CAMGE VAL 0 P)
	(TDZA VAL VAL)
	(MOVEI VAL 1)
	(POP P)
	(JUMPE VAL E0001)
	(MOVE VAL C)
	(JRST E0002)
E0001	(MOVE VAL D)
E0002
.EC
Peephole optimization guided by another rewrite function
can reduce this to six instructions.
.SEC AUTOMATIC ORDERING OF REWRITE RULES

In most pattern matchers, candidate patterns to match an input stream are
tried either in order of appearance on a list or in an essentially random
order not obvious to the programmer.  LISP70 tries matches in an order
specified by an "ordering function" associated with each set of rewrite
rules.

One common ordering is "BY APPEARANCE", which is appropriate when the
programmer wants conscious control of the ordering.  Another is
"BY SPECIFICITY", which is useful in left-to-right parsers and other
applications where the compiler can be trusted to order the rules so that
more specific cases are tried before more general ones.  When neither of
these standard functions is appropriate, the programmer can define and use
specialized ordering functions, or can extend SPECIFICITY to meet the
special requirements.

Automatic ordering is a great convenience for a user who is extending
a compiler, a natural language parser, or an inference system.  It can
eliminate the need to study the existing rules simply to determine where
to position a new rule.  Ordering functions can also be designed to detect
inconsistencies and ambiguities and to discover opportunities for
generalization of similar rules.

As an example, take the LISP-TO-ML translator "COMPILER",
which includes the following rule for the intrinsic function PLUS (slightly
simplified for presentation):
.B
RULES OF COMPILER =

	(PLUS :X :Y)
	      →	:X@COMPILER
		(PUSH_DOWN)
		:Y@COMPILER
		(ARITHMETIC ADD) ;
.E
To add special cases to the compiler for sums including the constant zero,
the user could include the following declaration in a program:
.B
RULES OF COMPILER ALSO =

	(PLUS :X 0) → :X@COMPILER,

	(PLUS 0 :X) → :X@COMPILER ;
.E
The compiler is ordered by SPECIFICITY, which knows that the
literal 0 is more specific than the variable :X or :Y.
Therefore, both of the new rules would be ordered before the original
PLUS rule.
Suppose the added rules were placed after the general rule;
then the original rule would get first crack at every input stream,
and sums with zero would not be processed as special cases.
.SEC AN ORDERING FUNCTION

The complete definition of the ordering function SPECIFICITY
is beyond the scope of this paper.  It works roughly as follows.
Comparing DEC patterns by a left-to-right scan, it considers
literals more specific than variables, a colon variable at its second
occurrence more specific than one at its first occurrence, and a function
call with an "@" more specific than a variable but less specific than
a literal.  The specificity of a replacement <F>
is that of the most general rule in the function F.

A DEC with an ellipsis
is considered to expand to multiple rules in which the ellipsis
is replaced by 0, 1, 2, 3, ... ∞ consecutive variables.  The specificity of
each expanded rule is considered separately.  Observe that between
two expansions of an elliptic rule some other rewrite rule of intermediate
specificity may lie.  Example:
.B
RULES OF SILLY =
	A ... B ... C	→	1,
	A B :X :Y	→	2 ;
.EC
Two of the expansions of the first rule are:
.B
	A B :X C	→	1,
	A :Z B C	→	1,
.EC
and the second rule of SILLY comes between these in specificity.

SPECIFICITY is itself defined by a system of rewrite rules.
To give a flavor of how this is done, a very simplified SPECIFICITY will be defined.
It takes two arguments (DEC patterns translated to LISP notation) and
returns them in the proper order.
.B
RULES OF SPECIFICITY =

	(COLON :V) (LITERAL :L)
		→ (ORDER (LITERAL :L) (COLON :V)),

	(LITERAL :L) (COLON :V)
		→ (ORDER (LITERAL :L) (COLON :V)) ;
.E
.SEC OTHER FACILITIES AND APPLICATIONS

Other facilities of the rewrite system include side-conditions, conjunctive match,
disjunctive match, non-match, repetition, evaluation of LISP and MLISP expressions,
look-ahead, look-behind, and reversible rules.

Out of rewrite functions, it is easy to
define systems of inference rules, assertions, and beliefs.
LISP70 has facilities for retrieving either all or the first of the assertions
in a set of rules that match a given pattern.

Rewrite rules are a great help in natural language understanding,
whether the methods used
are based on phrase structure grammar, features, keywords, or word patterns.
A use of LISP70 with the latter method is described in a companion
paper#[{REF ENEA_IDIOLECT}].
.SEC CONCLUSIONS

Many of the design decisions of LISP70 are contrary to trends seen in other
"successors to LISP".  The goals of these languages are similar, but their
means are often quite diverse.

Concern with good notation does not have to compromise the development
of powerful facilities; indeed, good notation can make those facilities
more convenient to use.

Emphasis on pattern-directed computation does not overly constrain the
programmer accustomed to writing algorithms.  Rewrites and algorithms can
be mixed, and the most appropriate means of defining a given function can be
selected.

LISP70 does not limit the use of pattern rewrite rules to
a few facilities like goal-achievement and assertion-retrieval.  A set of
rules can be applied to arguments like any other function, and can
stream data from any type of structure or process to any other.

Automatic ordering does not prevent the programmer from seizing control,
but allows him to relinquish control to a procedure of
his choosing to save him tedious study of an existing program when making
extensions.

The LISP70 kernel is being debugged on a PDP-10
at the time of this writing (February, 1973).
The language has already been used successfully
in programs for question-answering and planning.
After the kernel can reliably compile itself, extensions are planned to
improve its control structure, editing, and debugging capabilities, and
versions may be bootstrapped to other computers.
.<< REFERENCES >>
.BREAK
.SINGLE SPACE
.PREFACE 1
.REQUIRE "REF2.PUB" SOURCE_FILE